Image loading and storing as array

Image to tensor

The data_to_tensor function takes a list of image file paths as input, and returns a NumPy array that contains a tensor for each image in the input list. The function calls the image_to_tensor function on each input image file path using a list comprehension, and then uses the np.vstack function to stack the resulting tensors vertically into a single NumPy array.

The resulting NumPy array represents the input data for a deep learning model, where each row of the array corresponds to a single input example, and the columns of the array correspond to the features or dimensions of each example. In the case of image data, the columns of the array correspond to the pixel values of the images.

NOTE- image_to_tensor converts each image into an array (tensor) of fixed size 128x128, and data_to_tensor then stacks those arrays into a single stacked array.

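A minimal sketch of the two helpers described above, assuming Pillow for image loading and NumPy for stacking (the exact loader used in the original notebook is not shown, so this is an illustrative implementation):

```python
import numpy as np
from PIL import Image

IMG_SIZE = 128  # fixed size mentioned in the notes


def image_to_tensor(img_path):
    """Load one image, resize it to 128x128, and return a (1, 128, 128, 3) array."""
    img = Image.open(img_path).convert("RGB").resize((IMG_SIZE, IMG_SIZE))
    arr = np.asarray(img, dtype=np.float32)
    return np.expand_dims(arr, axis=0)  # add a batch dimension so vstack can stack rows


def data_to_tensor(img_paths):
    """Stack the per-image tensors vertically into one (N, 128, 128, 3) array."""
    tensors = [image_to_tensor(p) for p in img_paths]
    return np.vstack(tensors)
```

Each row of the returned array is one input example, as described above.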

Display The Images

Create tensors and targets

Normalize the tensors

Normalization is a common preprocessing step in deep learning that rescales the input features to be in a similar range. Normalizing pixel values to lie between 0 and 1 is important because it helps the optimizer to converge faster and ensures that each feature contributes equally to the training process.

Normalization of tensors refers to the process of scaling the pixel values of an image to a common range, typically between 0 and 1 or -1 and 1. This is done to standardize the data so that the model can process it more effectively. Normalization is necessary because images can have varying pixel ranges depending on factors such as image resolution, brightness, and contrast. By normalizing the pixel values, the model can better understand the relationships between the pixels and more accurately learn patterns in the data.

There is no loss of data in the process of normalization: it scales the data to the range 0 to 1 without discarding any information, and the original values can be recovered by multiplying back by 255.

When converting image data to floating-point values, the pixel values are typically in the range of 0 to 255, with 0 representing black and 255 representing white. Dividing each pixel value by 255 scales the values down to the range of 0 to 1, which is a common range for neural network inputs.
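The rescaling step above is a single division; a small sketch with hypothetical pixel values shows that the transform is invertible (no information is lost):

```python
import numpy as np

# Hypothetical 8-bit pixel values in the range [0, 255]
tensors = np.array([[0, 64, 128, 255]], dtype=np.uint8)

# Cast to float and rescale to [0, 1]
normalized = tensors.astype(np.float32) / 255.0

# Multiplying back by 255 recovers the original values exactly
recovered = normalized * 255.0
```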

One-hot encoding

One-hot encode the targets. One-hot encoding is a technique used to represent categorical data in a format suitable for machine learning algorithms. It involves representing each category as a binary vector, where each element in the vector corresponds to a unique category. The element corresponding to the category is set to 1, while all other elements are set to 0.

For example, suppose we have a set of categories: {apple, banana, cherry}. In one-hot encoding, we represent each category as a vector as follows:

apple  = [1, 0, 0]
banana = [0, 1, 0]
cherry = [0, 0, 1]

Converting target values to one-hot encoding is necessary when the target variable has categorical values, especially when building a multi-class classification model. In one-hot encoding, a column is created for each category value in the target variable, and the column for the respective category is marked as 1, while all other columns are marked as 0 for each sample. This representation helps the neural network to understand that the output is categorical and not continuous. In the case of multi-class classification, the output layer of the neural network has multiple neurons, each representing a class. The neuron with the highest value is considered as the predicted class for the input image.
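The encoding above can be sketched in plain NumPy (Keras provides the equivalent `keras.utils.to_categorical`; the function name and label mapping here are illustrative):

```python
import numpy as np


def one_hot(targets, num_classes):
    """targets: integer class labels, shape (N,). Returns a (N, num_classes) array."""
    encoded = np.zeros((len(targets), num_classes), dtype=np.float32)
    encoded[np.arange(len(targets)), targets] = 1.0  # set the column for each label to 1
    return encoded


# Assumed mapping: apple=0, banana=1, cherry=2
labels = np.array([0, 1, 2, 1])
encoded = one_hot(labels, 3)
```

Each row sums to 1, with exactly one "hot" position per sample.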

Split the data

20% of the data is held out for testing and the rest is used for training.
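An 80/20 split can be sketched with a shuffled index permutation (in practice `sklearn.model_selection.train_test_split` does the same job; the function name and seed here are assumptions):

```python
import numpy as np


def split_data(x, y, test_frac=0.2, seed=0):
    """Shuffle, then hold out test_frac of the samples for testing; rest for training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # random order of sample indices
    n_test = int(len(x) * test_frac)       # 20% by default
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]
```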

Model

Sequential models - are linear stacks of layers where one layer leads to the next, i.e., the output of the previous layer is the input to the next layer.

model.add(Conv2D(128, (3, 3), input_shape=x_train.shape[1:])) - adds a 2D convolution layer with 128 filters, each a 3x3 matrix (in pixels). These filters are used to extract features like edges and corners. Such low-level features can then be combined into more complex features to distinguish between classes of images. input_shape=x_train.shape[1:] tells the layer the shape of each input sample, which is important for the first layer. NOTE- We could also use 32, 64, or 256 filters and a (5, 5) or (7, 7) kernel size (trial and error).

LeakyReLU - is a type of activation function used in CNNs to avoid the vanishing-gradient problem: unlike standard ReLU, it allows a small, non-zero gradient for negative inputs.

After convolving with the 3x3 filters we do max pooling for "dimensionality reduction". In max pooling we define a pool (filter) of a particular size (say 2x2); for every 2x2 region, the maximum value is selected and passed to the next layer (average pooling is an alternative). We use MaxPooling2D because our convolution layer is also a 2D layer (Keras also supports 1D and 3D max-pooling layers). We are not using zero padding, so each convolution slightly reduces the spatial resolution (padding='same' would preserve it).
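The 2x2, stride-2 max-pooling step can be demonstrated on a single-channel feature map with a NumPy reshape trick (an illustrative sketch, not how Keras implements it internally):

```python
import numpy as np


def max_pool_2x2(fmap):
    """2x2, stride-2 max pooling on a single-channel feature map of shape (H, W)."""
    h, w = fmap.shape
    # Trim odd edges, reshape rows/columns into 2x2 blocks, take the max of each block
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [0, 8, 3, 4]])
pooled = max_pool_2x2(fmap)  # each output value is the max of one 2x2 block
```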

Dropout - 25% of the input units to the Dropout layer will be randomly set to zero at each iteration, which helps to prevent overfitting by reducing the interdependence of the neurons and forcing the model to learn more robust features.

model.add(Conv2D(128, (3, 3))) - adds a second convolution layer that applies another set of filters to the output of the first MaxPooling2D layer, extracting more complex, higher-level features from the input images.

model.add(GlobalMaxPooling2D()) - is a global operation that extracts the most important (maximum) feature from each feature map of the output tensor. Global pooling layers can be used in a variety of cases: primarily to reduce the dimensionality of the feature maps output by a convolutional layer, and to replace Flatten and sometimes even Dense layers in the classifier.

Dense - a dense layer is used to transform the features extracted by convolutional layers into a format that can be used for classification or regression. Dense layers perform a matrix multiplication on the input features, followed by an activation function, to produce a set of output values. They are often used for tasks that require mapping input to output in a non-linear way, such as image classification; typical sizes here are 512 or 32 neurons.
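Putting the layers described above together gives an architecture along these lines (a sketch, assuming 128x128 RGB inputs and a hypothetical num_classes of 10; an explicit Input layer stands in for input_shape=x_train.shape[1:]):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, LeakyReLU, MaxPooling2D,
                                     Dropout, GlobalMaxPooling2D, Dense)

num_classes = 10  # assumption; set this to your number of target classes

model = Sequential([
    Input(shape=(128, 128, 3)),       # equivalent to input_shape=x_train.shape[1:]
    Conv2D(128, (3, 3)),              # low-level features: edges, corners
    LeakyReLU(),
    MaxPooling2D(pool_size=(2, 2)),   # dimensionality reduction
    Dropout(0.25),                    # randomly zero 25% of inputs to fight overfitting
    Conv2D(128, (3, 3)),              # more abstract, higher-level features
    LeakyReLU(),
    GlobalMaxPooling2D(),             # one max value per feature map
    Dense(512),
    LeakyReLU(),
    Dropout(0.25),
    Dense(num_classes, activation="softmax"),  # one neuron per class
])
```

The neuron with the highest softmax value is taken as the predicted class.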

The compile method is used to configure the learning process of a model. It requires the user to specify a loss function, an optimizer, and the evaluation metrics used to judge the performance of the model during training and testing. The choice of evaluation metric should be based on the problem at hand and the specific goals of the model: accuracy might be a good choice if the classes are well balanced and the goal is high overall accuracy, while precision and recall might be more appropriate if there is class imbalance and the goal is to correctly classify the positive cases.
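A typical compile call for the multi-class, one-hot setup above might look like this (the tiny stand-in model, and the choice of adam as optimizer, are assumptions for illustration; categorical_crossentropy is the standard loss for one-hot targets):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

# Minimal stand-in model; the compile call is what this sketch illustrates.
model = Sequential([Input(shape=(4,)), Dense(3, activation="softmax")])

model.compile(loss="categorical_crossentropy",  # pairs with one-hot targets
              optimizer="adam",                 # common default choice
              metrics=["accuracy"])
```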

Callbacks

In Keras, callbacks are functions that can be applied at certain stages of the training process. They are passed to the fit() method using the callbacks argument, and allow you to monitor and modify the behavior of the model during training.

Some common examples of callback functions are:

ModelCheckpoint: Saves the model weights after every epoch, or when the validation loss improves.
EarlyStopping: Stops the training process if the validation loss stops improving after a certain number of epochs.
ReduceLROnPlateau: Reduces the learning rate if the validation loss stops improving after a certain number of epochs.

Using callbacks can help you to achieve better performance and avoid overfitting in your models.
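The three callbacks above could be configured like this (the file path, monitored metric, and patience values are illustrative assumptions, not values from the original notebook):

```python
from tensorflow.keras.callbacks import (ModelCheckpoint, EarlyStopping,
                                        ReduceLROnPlateau)

callbacks = [
    # Save the full model whenever validation loss improves
    ModelCheckpoint("best_model.keras", monitor="val_loss", save_best_only=True),
    # Stop if validation loss has not improved for 5 epochs
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    # Halve the learning rate after 3 epochs without improvement
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]
# Passed to training as: model.fit(..., callbacks=callbacks)
```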